The KWAC Index Generation Program was implemented at
the State University of New York at Buffalo. It consists of only 252
statements for the COBOL compiler on the CDC 6400 computer. The
KWAC program is essentially a KWIC index generator designed for a
special purpose for use in a particular course, though it has
sufficient flexibility to be used in other, similar contexts. The
program takes free form natural language input and generates an
alphabetical index of each significant word appearing in the
input, alongside which appears the bibliographical description of the
document in which the word was located. Page headers and footers
are permitted, and index page numbers are assigned and printed
automatically. A word is significant 1) by default, if it
does not appear in a user-input stop list, begins with
an alphabetic character, and is more than one character long; and
2) an otherwise significant word may be eliminated from the indexing
by a simple procedure at the time of input. (Author/SJ)
I. INTRODUCTION

The KWAC (Key Word Alongside of Context) Index Generation
Program was written and debugged by Lester A. Rice* at the School
of Information and Library Studies, SUNY at Buffalo, during the
winter of 1972-73. The purpose of the program was primarily pedagogical.
It was developed specifically for LI 561, Information Storage,
Retrieval, and Selective Dissemination Systems, where it was used
successfully during the Spring Term 1973. The intention was to
produce a group of indexes to the same data base for the purpose
of comparative evaluation of their performance in an experimental
environment. The corpus indexed at that time was 36 substantive
articles appearing in the Journal of the American Society for
Information Science, v. 22 (1971).

KWAC is a KWIC-type index written for the specific purpose
indicated above, and consequently has some rather severe input and
output limitations. It is acknowledged that these limits reduce
its general usefulness. Although it would not be particularly difficult
to expand them for more general use, this has not been done for
two reasons: 1) the ready availability of existing program packages
with considerably more flexible capability; and 2) the simplicity
and proven effectiveness of the KWAC index for the purposes intended.



* Dr. Rice is currently employed in the Reference Department, University
Libraries, University of Pennsylvania, Philadelphia, Pennsylvania 19104.



II. PROGRAM DESCRIPTION 

The KWAC Index Generation Program Version 3.0 was implemented
at the Computing Center of SUNY at Buffalo on May 10, 1973. It
consists of only 252 statements for the COBOL Compiler Edition
V310222 on the CDC 6400 computer under the KRONOS operating system
used at Buffalo. There should be few, if any, problems implementing
the program at other installations that have a COBOL compiler.¹
The program runs only in batch mode.

The KWAC program is essentially a KWIC index generator designed
for a special purpose for use in a particular course, though it has
sufficient flexibility to be used in other, similar contexts.
Basically, the program takes free form natural language input and
generates an alphabetical index of each significant word
appearing in the input. Alongside each word appears the bibliographical
description of the document in which the word was located. Output
options permit a free form title page, introduction, and epilog. Page
headers and footers are permitted, and index page numbers are assigned
and printed automatically. Significant words are determined in two
ways: 1) by default, a word is significant if it does not appear in a
user-input stop list, begins with an alphabetic character, and is
more than one character long; and 2) an otherwise significant word
may be eliminated from the indexing by a simple procedure at the
time of input.
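To illustrate the process just described, the following Python sketch builds a KWAC-style index. Each significant word becomes an entry, and the full description of its document is printed alongside it. This is not the author's COBOL program. The names `is_significant` and `kwac_entries` are hypothetical, Python's string order stands in for the COBOL sort, and the colon exception, trailing-punctuation stripping, and the 30-character print limit are deliberately left out.

```python
def is_significant(word: str, stoplist: set[str]) -> bool:
    """Default significance test: not a stopword, begins with a letter,
    and longer than one character."""
    return len(word) > 1 and word[0].isalpha() and word not in stoplist


def kwac_entries(descriptions: list[str], stoplist: set[str]) -> list[tuple[str, str]]:
    """Pair each significant word with the description it came from,
    then sort the pairs alphabetically by word."""
    entries = [
        (word, description)
        for description in descriptions
        for word in description.split()
        if is_significant(word, stoplist)
    ]
    # Assumption: Python's string order stands in for the COBOL collating sequence.
    entries.sort(key=lambda entry: entry[0])
    return entries


if __name__ == "__main__":
    docs = [
        "HUMAN FACTORS IN THE DESIGN OF AN INTERACTIVE LIBRARY SYSTEM",
        "KWOC INDEXES AND VOCABULARY COMPARISONS",
    ]
    stop = {"IN", "THE", "OF", "AN", "AND"}
    for word, description in kwac_entries(docs, stop):
        # Index word on the left, its document description alongside it.
        print(f"{word:<30} {description}")
```

The second way of eliminating a word uses the same test. Prefixing a word with a non-alphabetic character makes `word[0].isalpha()` false, so the word is skipped even though it is not on the stop list.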

Input procedures are straightforward, and students with minimal
keypunching experience have been able to carry them out with very little
difficulty. While no cost studies have been undertaken, in the
one use already made the CPU time required for the whole index generation
was less than 11 seconds. In this case there were 36 pages of index
output, consisting of a total of 318 index entries generated from
36 documents. The cost of preparing the first copy of the index
(exclusive of input costs) at our installation was $1.87.

¹ Anyone desiring a program listing and/or deck should make arrangements
with the author of this User's Guide at SILS/SUNY at Buffalo.

III. INPUT REQUIREMENTS

III. 1- Each document in the corpus to be indexed must have exactly three
80-column cards for input. One or two of these cards may be
blank if 240 characters of input are not required, but each
document must be represented by three cards.

III. 2- Each set of three cards may be thought of as a single 240-column
"supercard". Free form data may be punched beginning at any point,
but it is recommended that the first column be used, as uneven left
margins may otherwise occur in the printed output. The data on
the three cards will be printed exactly as punched (i.e., same
spelling, line endings, etc.) as the "context" beside each index
entry generated from the document description. Thus it is
essential that the cards be arranged in the correct order. Care should
be taken that words not be hyphenated at the end of a card (i.e.,
columns 80 and 160). Further, if a word ends in column 80 or 160,
a blank should be left in the first column of the following card.
Otherwise the two words will be joined, and no index entry will be
generated for the second word.
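The joining problem described above can be reproduced with a small sketch. It assumes the program treats a document's three cards as one 240-character string and splits words on blanks, which is how the paragraph describes the "supercard". The function name `supercard` is hypothetical.

```python
CARD_WIDTH = 80
CARDS_PER_DOCUMENT = 3


def supercard(cards: list[str]) -> str:
    """Concatenate a document's three 80-column cards into one
    240-column 'supercard', padding short card images with blanks."""
    if len(cards) != CARDS_PER_DOCUMENT:
        raise ValueError("each document must be punched on exactly three cards")
    if any(len(card) > CARD_WIDTH for card in cards):
        raise ValueError("a card holds at most 80 columns")
    return "".join(card.ljust(CARD_WIDTH) for card in cards)


if __name__ == "__main__":
    # A word that ends exactly in column 80 of the first card...
    first = "RETRIEVAL".rjust(CARD_WIDTH)
    # ...runs into a word that starts in column 1 of the next card.
    print(supercard([first, "SYSTEMS", ""]).split())   # ['RETRIEVALSYSTEMS']
    # Leaving column 1 of the following card blank keeps the words apart.
    print(supercard([first, " SYSTEMS", ""]).split())  # ['RETRIEVAL', 'SYSTEMS']
```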

III. 3- The IBM 029 Keypunch and the CDC 6400 Printer do not always use
the same code to represent the same graphic symbol (e.g., the IBM 029
code for & is printed as a different character by the 6400 Printer).
While this is not often a problem with bibliographic data, it is
recommended that any & appearing in a document title be translated
and input as the character string AND.
III. 4- The program will not generate entries for the following character
strings:

a- Those which are only one character long

b- Those which begin with any non-alphabetic character except the
colon (:)

Character strings longer than 30 characters will be truncated at
that point when printed.

III. 5- For improved readability, and to link character strings that are by
convention orthographically separate but semantically linked
(e.g., LOS ANGELES), it is recommended that a hyphen be placed between
the strings instead of a blank (e.g., LOS-ANGELES). Such a hyphen may
fall in column 80 or 160, Paragraph 2 above notwithstanding.

III. 6- The standard COBOL sort order is used to arrange the index terms
that are generated. However, as only significant words are indexed,
no terms will be generated for character strings beginning with
blanks, numbers, or special symbols, with the exception of the colon
(:). In the COBOL sort the colon falls after the letters. Thus, for
practical purposes, the entire sort order for the first character in
each string is A, B, C, ..., Z, :. This position of the colon may
be exploited for several purposes. For example, it is possible
to precede each author's name with a colon as it is input.
All of the non-author entries are then generated and
sorted at the beginning of the printed index, followed by all
of the authors' names (each preceded by a colon) in a separate
alphabetical sequence. The sort goes as far into the character
string as necessary to ensure complete and perfect ordering,
so the authors' names are printed in alphabetical order, just
as the title words are. Similarly, if desired, some other
bibliographic element (e.g., the journal title) could be preceded
by two colons (i.e., "::") to produce a third separate alphabetical
sequence, and so on as many times as necessary. It has been
observed that the colon preceding an author's name is not very
obtrusive and that it does not seriously interfere with the
scanning of a list of such names.
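The colon trick can be simulated in Python, but Python's default string order will not reproduce it. In ASCII the colon (code 58) sorts *before* the letters, so a custom sort key is needed. The sketch below is an assumption-laden stand-in for the CDC COBOL collating sequence. It models only the rule stated above: letters first, then the colon. Every other character keeps its ASCII rank. `collate_key` and `COLON_RANK` are hypothetical names.

```python
# Assumption: rank the colon after every other character; all other
# characters keep their ASCII order. This is not the actual CDC COBOL
# collating sequence, only the part of it the text describes.
COLON_RANK = 1000


def collate_key(term: str) -> tuple[int, ...]:
    """Map each character to a sort rank, moving ':' after the letters."""
    return tuple(COLON_RANK if ch == ":" else ord(ch) for ch in term)


if __name__ == "__main__":
    terms = ["::JASIS,V22,1971", ":RICE,LESTER-A", "INDEXING",
             ":BELL,JOHN-M", "ANALYTICAL"]
    for term in sorted(terms, key=collate_key):
        print(term)
```

Title words come first. Single-colon author names follow as a second alphabetical run, and the double-colon journal entry comes last. This happens because a second colon outranks any letter in the second position.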

III. 7- Any character string may be made non-significant simply by
preceding it with any non-alphabetic character except the colon (see
Paragraph 6). However, it is recommended that the logical not (¬)
always be used for this purpose, to exploit a special feature
of the program. Any character punched in the input cards will
be printed exactly as it was punched (see Paragraph 2), with one
exception: the logical not, and then only when a further
condition is met. On any card on which a logical not
appears and it is not desirable to have it printed, it
can be suppressed (i.e., replaced by a blank) simply by putting
another logical not in the last column of that card. When
this is done, the printing of both of them is suppressed.
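The print-suppression rule can be expressed as a small function. The sketch below assumes the test looks only at column 80. It also assumes that when column 80 holds a logical not, every logical not on that card is blanked, since the text speaks of "both of them". The hypothetical name `printable_card` is not from the original program. Indexing would still see the raw card, so a word such as ¬THE stays non-significant even though its ¬ is not printed.

```python
NOT_SIGN = "\u00ac"  # the logical-not character (¬)
CARD_WIDTH = 80


def printable_card(card: str) -> str:
    """Return the card image as it would be printed.

    If column 80 holds a logical not, every logical not on the card is
    replaced by a blank (an assumption); otherwise the card prints exactly
    as punched.
    """
    image = card.ljust(CARD_WIDTH)
    if image[CARD_WIDTH - 1] == NOT_SIGN:
        return image.replace(NOT_SIGN, " ")
    return image


if __name__ == "__main__":
    punched = NOT_SIGN + "THE DESIGN OF LIBRARY SYSTEMS".ljust(78) + NOT_SIGN
    print(repr(printable_card(punched).rstrip()))  # ' THE DESIGN OF LIBRARY SYSTEMS'
```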

III. 8- Many other modifications of, and additions to, the raw bibliographic
data can be made before input in order to produce a better index.
Indexes of the KWIC family are "quick and dirty," and this phrase
reveals both their great virtue and their great
fault. They may be made quickly and inexpensively, but
unfortunately they are far from perfect retrieval tools, and they are
ordinarily used only where speed and low cost are demanded.
If better retrieval tools are needed, it is recommended that
another indexing method be used. Consequently, only the index
improvements mentioned above should be made.

IV. DECK MAKEUP TO GENERATE A KWAC INDEX
An assembled deck to generate and print a KWAC index consists
of the following parts:

1- Job control cards, 

2- KWAC control cards, and

3- Data input cards. 

IV. A. JOB CONTROL CARDS
Job control cards are specific to each computer installation. Do
whatever is necessary to have the job accepted and to invoke the
COBOL compiler. Use of the program thus far has been satisfactory
with a field length of 55K. However, the required field length depends
on the amount of space needed for the COBOL sort, which in turn depends
on the number of index entries generated. (See Appendix A for the
appropriate SUNY at Buffalo job control cards.)

IV. B. KWAC CONTROL CARDS AND DATA INPUT CARDS
The data input cards and the control cards for the KWAC program
are not separated from each other. The following list of control
and data cards is arranged in the proper order for making up
a deck to generate a KWAC index.

Control cards contain one or more characters and must begin in
column 1 of each card. They may extend as far to the right as
necessary, with the exceptions noted below. Data input cards may have
free form input, with the exceptions noted in their descriptions
below. On control cards, numbers must fill columns 1-3, with leading zeros
supplied if necessary.

IV. B. 1- PREFACE LINE SPACING CONTROL CARD indicates the number of blank
lines to appear between each line of printing in the preface.

IV. B. 2- EPILOG LINE SPACING CONTROL CARD indicates the number of blank
lines to appear between each line of printing in the epilog.

IV. B. 3- BEGINNING OF PREFACE MARKER (*) is required. An asterisk must
appear in column 1, even if there is no preface.

IV. B. 4- PREFACE DATA INPUT CARDS must appear in pairs. As the preface
will be printed exactly as it is input, certain precautions must
be observed in punching this data. A card has 80 columns,
but the printed line of output is limited to 136
characters. Accordingly, for the purpose of centering the data
of the preface in the printed output, think of the midpoint of the
printed line as falling between columns 68 and 69 of the first of two
cards. For example, to center the word PREFACE at the top of the first
page, use the following procedure. The
word PREFACE contains seven characters. Subtract 7 from 136,
yielding 129. Divide by 2, yielding 64.5. Thus the word PREFACE
should begin in either column 64 or 65 for approximate centering.
Other lines may be centered, or left- or right-justified, by
similar methods.
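The centering arithmetic and the two-card layout can be sketched as follows. The function names are hypothetical. The sketch assumes that the first card of a pair supplies print positions 1-80 and the second card supplies positions 81-136 in its columns 1-56. This is consistent with the midpoint falling between columns 68 and 69 of the first card, but the text does not state it.

```python
import math

LINE_WIDTH = 136  # printed line width given in the text
CARD_WIDTH = 80


def centering_columns(text: str) -> tuple[int, int]:
    """The recipe above: (136 - length) / 2, rounded down or up."""
    half = (LINE_WIDTH - len(text)) / 2
    return math.floor(half), math.ceil(half)


def preface_card_pair(text: str, start_col: int) -> tuple[str, str]:
    """Place text on one 136-position print line starting at start_col
    (1-based) and split that line across a pair of 80-column cards.

    Assumption: card 1 holds print positions 1-80 and card 2 holds
    positions 81-136 in its columns 1-56.
    """
    line = (" " * (start_col - 1) + text).ljust(LINE_WIDTH)
    if len(line) > LINE_WIDTH:
        raise ValueError("text does not fit on a 136-character line")
    return line[:CARD_WIDTH], line[CARD_WIDTH:].ljust(CARD_WIDTH)


if __name__ == "__main__":
    print(centering_columns("PREFACE"))   # (64, 65)
    first, second = preface_card_pair("PREFACE", 65)
    print(first.index("PREFACE") + 1)     # 65
    print(second.strip() == "")           # True
```

Strictly, 64.5 is the number of blank positions to the left of the word. Starting in column 65 or 66 therefore balances the margins most closely. The 64-or-65 rule above is, as stated, approximate.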

IV. B. 5- ENDING OF PREFACE MARKER (*) is required, even if there is no
preface.

IV. B. 6- HEADER/FOOTER OPTION CONTROL CARD. A pound sign (#) is required
if headers and footers are to be printed on each page of the index.
If this option is not chosen, a blank card must be used in its place.

IV. B. 7- HEADER/FOOTER INPUT DATA CARDS must be used if there is a # in column
1 of the preceding card. If used, there must be exactly two cards,
with data entered as follows. The word PAGE and the page number
are automatically generated and printed at the center of both
the top and the bottom of each index page. As each of the two
cards is printed exactly as the data is entered on it,
care should be taken at input to ensure appropriate spacing of the
printed output.

IV. B. 8- NUMBER OF STOPWORDS CONTROL CARD must be used to specify the number
of stopwords to be input (i.e., the number of words to be declared
non-significant throughout the entire index generation). The
maximum number that may be declared is 153. If more are needed,
use the method described in Paragraph III. 7. above. A stopped
word can be unstopped by adding two periods (..) immediately after
it (i.e., creating a non-stopword character string). Two periods are
required because the program automatically searches for a punctuation
mark at the end of each word and strips it off, so a single period
would simply be removed.
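The stop-list rules and the two-period escape are sketched below. The assumptions are that exactly one trailing punctuation mark is stripped before the stop-list lookup, and that Python's `string.punctuation` approximates what the program counts as punctuation. The function names are hypothetical.

```python
import string

MAX_STOPWORDS = 153
MAX_STOPWORD_LENGTH = 20


def read_stoplist(cards: list[str]) -> set[str]:
    """One stopword per card, starting in column 1, at most 20 characters."""
    if len(cards) > MAX_STOPWORDS:
        raise ValueError(f"at most {MAX_STOPWORDS} stopwords may be declared")
    stoplist = set()
    for card in cards:
        word = card.rstrip()
        if not word or word[0] == " " or len(word) > MAX_STOPWORD_LENGTH:
            raise ValueError(f"bad stopword card: {card!r}")
        stoplist.add(word)
    return stoplist


def strip_final_punctuation(word: str) -> str:
    """Remove one trailing punctuation mark, if present (assumed behavior)."""
    if word and word[-1] in string.punctuation:
        return word[:-1]
    return word


def is_stopped(word: str, stoplist: set[str]) -> bool:
    return strip_final_punctuation(word) in stoplist


if __name__ == "__main__":
    stop = read_stoplist(["USE", "OF"])
    print(is_stopped("USE", stop))    # True
    print(is_stopped("USE,", stop))   # True: the comma is stripped first
    print(is_stopped("USE..", stop))  # False: one period is stripped, leaving "USE."
```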

IV. B. 9- STOPWORD LIST INPUT DATA requires as many cards as the number
specified on the preceding card. Only one stopword may be entered
on each card, and the first letter must be in column 1. The
maximum length of any stopword is 20 characters.



IV. B. 10- BIBLIOGRAPHIC DATA INPUT CARDS may be in free form format, but
see Paragraph III. 2. above for suggestions. Each logical record
must consist of 3 physical cards, with blank cards used if necessary.

IV. B. 11- END OF DATA MARKER (*) in column 1 must follow the last bibliographic
data input card.

IV. B. 12- EPILOG INPUT DATA CARDS have exactly the same format requirements
as the preface input data cards discussed in Paragraph IV. B. 4.

APPENDIX A



JOB CONTROL CARDS REQUIRED AT SUNY AT BUFFALO 
(KRONOS version 2.0.9 on a CDC 6400) 



Column 1



BATCH,T^0,F«550O0,P«3O,R>«. CONAWAY— JOB NAME. 
LISCHAS , CONAWAY, PASSWORD .

OOBOLaR) 
REDUCE, NO. 
LGO. 
7-8-9

(KWAC Source Deck)
7-8-9

(Preface Line Spacing Control Card)
(Epilog Line Spacing Control Card)
(Beginning of Preface Marker)
(Preface Data Input Cards)
(Ending of Preface Marker Card)
(Header/Footer Option Control Card)
(Header/Footer Input Data Cards)
(Number of Stopwords Control Card)
(Stopword List Input Data Cards)
(Bibliographic Data Input Cards)
(End of Data Marker)
(Epilog Input Data Cards)
7-8-9
6-7-8-9



APPENDIX B 
Sample Output Pages 
[Sample output pages largely illegible in this copy. The page header reads "... INDEX TO JASIS V22 (1971)", and the pages are dated April 30th, 1973. Entries show index words, including author names, alongside bibliographic descriptions such as "COMPARATIVE EVALUATION OF TWO INDEXING METHODS USING JUDGES".]
APPENDIX C



Sample Bibliographic Data Input 
Card Listing 



[Card listing largely illegible in this copy. Each record gives author name(s), the article title, and a citation of the form (JASIS,V22,1971,P...). Legible titles include "HUMAN FACTORS IN THE DESIGN OF AN INTERACTIVE LIBRARY SYSTEM" and "COMPARATIVE EVALUATION OF TWO INDEXING METHODS USING JUDGES".]



APPENDIX D 
KNOWN BUGS IN KWAC (JULY 24, 1973)

The footer, but not the header, loses the character in column 1 in
the printed index. To avoid the problem, begin the header/footer
input data in column 2.

A character string ending in column 80 of the last card of 
a set of 3 bibliographic data input cards will not be indexed. 
Avoid the bug by always leaving the last column of the last 
card blank. 



